Compute Seasonal TSS Medians without fancy ROS or bootstrapping

Load the data


In [1]:
import pandas

tss = pandas.read_csv("NSQD_Res_TSS.csv")

Compute the medians for each season without dropping duplicates


In [2]:
medians = (
    # Select 'res' before aggregating so that non-numeric columns
    # are never passed to median()
    tss.groupby(by=['parameter', 'units', 'season'])['res']
        .median()
        .reset_index()
)

medians


Out[2]:
                parameter  units  season  res
0  Total Suspended Solids   mg/L  autumn   55
1  Total Suspended Solids   mg/L  spring  100
2  Total Suspended Solids   mg/L  summer   93
3  Total Suspended Solids   mg/L  winter   92
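
Before dropping anything, it can help to see how many records repeat the same identifying columns. The following is a minimal sketch on hypothetical toy data, not the NSQD file: the `toy` frame, its values, and the shortened `key_cols` list are assumptions made for illustration only.

```python
import pandas

# Hypothetical toy data: station A's 2020-01-05 sample appears twice.
toy = pandas.DataFrame({
    'station': ['A', 'A', 'A', 'B'],
    'start_date': ['2020-01-05', '2020-01-05', '2020-02-10', '2020-01-07'],
    'season': ['winter', 'winter', 'winter', 'winter'],
    'res': [200.0, 200.0, 40.0, 60.0],
})

# Assumed identifying columns (a shortened stand-in for a real key)
key_cols = ['station', 'start_date', 'season']

# True for every row that repeats a key already seen in an earlier row
is_repeat = toy.duplicated(subset=key_cols, keep='first')

# Total number of repeated records, and the same count per season
n_repeats = int(is_repeat.sum())
repeats_by_season = is_repeat.groupby(toy['season']).sum()
```

A nonzero count suggests the per-season medians may shift once the repeats are removed.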

Compute the medians for each season after dropping duplicate records


In [3]:
index_cols = [
    'epa_rain_zone', 'location_code', 'station_name', 'primary_landuse',
    'start_date', 'season', 'station', 'parameter', 'units',
]

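medians = (
    tss.groupby(by=index_cols)
        .first()  # keep one record per unique combination of index_cols
        .reset_index()
        # Select 'res' before aggregating so that non-numeric columns
        # are never passed to median()
        .groupby(by=['parameter', 'units', 'season'])['res']
        .median()
        .reset_index()
)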

medians


Out[3]:
                parameter  units  season    res
0  Total Suspended Solids   mg/L  autumn   53.0
1  Total Suspended Solids   mg/L  spring  100.0
2  Total Suspended Solids   mg/L  summer   94.2
3  Total Suspended Solids   mg/L  winter   92.0
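
To see why the two sets of medians can differ, here is a small sketch of the same groupby-first-then-median pattern on hypothetical toy data. The `toy` frame, its values, and the shortened `key_cols` list are assumptions for illustration and are not drawn from the NSQD file.

```python
import pandas

# Hypothetical toy data: station A's 2020-01-05 sample was entered twice.
toy = pandas.DataFrame({
    'station': ['A', 'A', 'A', 'B', 'B'],
    'start_date': ['2020-01-05', '2020-01-05', '2020-02-10',
                   '2020-01-07', '2020-02-11'],
    'season': ['winter'] * 5,
    'res': [200.0, 200.0, 40.0, 60.0, 80.0],
})

# Assumed identifying columns (a shortened stand-in for a real key)
key_cols = ['station', 'start_date', 'season']

# Median with the repeated record still present
raw_median = toy.groupby('season')['res'].median()

# Keep the first record for each key, then take the median
dedup_median = (
    toy.groupby(by=key_cols)
        .first()
        .reset_index()
        .groupby('season')['res']
        .median()
)
```

In this toy case the repeated high value pulls the raw winter median up to 80.0, while the de-duplicated median is 70.0.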